1 Introduction

Let $P$ be a Markov transition kernel on a state space $\mathsf{X}$ equipped with a countably generated $\sigma$-field $\mathcal{X}$. For a control function $f : \mathsf{X} \to [1,\infty)$, the $f$-total variation or $f$-norm of a signed measure $\mu$ on $\mathcal{X}$ is defined as

$$\|\mu\|_f := \sup_{|g| \le f} |\mu(g)| .$$

When $f \equiv 1$, the $f$-norm is the total variation norm, which is denoted $\|\mu\|_{\mathrm{TV}}$. Assume that $P$ is aperiodic positive Harris recurrent with stationary distribution $\pi$. Then the iterated kernels $P^n(x,\cdot)$ converge to $\pi$. The rate of convergence of $P^n(x,\cdot)$ to $\pi$ does not depend on the starting state $x$, but exact bounds may depend on $x$. Hence, it is of interest to obtain non-uniform or quantitative bounds of the following form

$$\sum_{n=1}^{\infty} r(n)\,\|P^n(x,\cdot) - \pi\|_f \le g(x) , \quad \text{for all } x \in \mathsf{X} , \tag{1}$$

where $f$ is a control function, $\{r(n)\}_{n \ge 0}$ is a non-decreasing sequence, and $g$ is a nonnegative function which can be computed explicitly.
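On a finite state space the quantities entering (1) can be evaluated directly, which may help to fix ideas. The following is a minimal sketch under stated assumptions: the 3-state matrix `P`, the truncation level `n_max` and the helper names `f_norm` and `weighted_sum` are hypothetical and chosen only for illustration. On a finite space the $f$-norm of a signed measure $\mu$ reduces to $\sum_i f(i)|\mu(i)|$.

```python
import numpy as np

def f_norm(mu, f):
    """f-norm of a signed measure on a finite space: sum_i f(i) * |mu(i)|."""
    return float(np.sum(f * np.abs(mu)))

# Hypothetical 3-state transition matrix (each row sums to 1).
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])

# Stationary distribution: left eigenvector of P for the eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
pi = pi / pi.sum()

def weighted_sum(x, r, f, n_max=200):
    """Left-hand side of (1), truncated at n_max (an assumption of this sketch)."""
    row = np.zeros(len(pi))
    row[x] = 1.0                      # Dirac mass at the starting state x
    total = 0.0
    for n in range(1, n_max + 1):
        row = row @ P                 # row is now P^n(x, .)
        total += r(n) * f_norm(row - pi, f)
    return total
```

A finite irreducible aperiodic chain converges geometrically, so this example only illustrates the objects appearing in (1), not the subgeometric regime studied in the paper.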

As emphasized in [RR04, Section 3.5], quantitative bounds have a substantial history in Markov chain theory. Applications are numerous, including convergence analysis of Markov Chain Monte Carlo (MCMC) methods and transient analysis of queueing systems or storage models. With few exceptions, however, these quantitative bounds were derived under conditions which imply geometric convergence, i.e. $r(n) = \beta^n$ for some $\beta > 1$ (see for instance [MT94], [Ros95], [RT99], [RR04], and [Bax05]).

Geometric convergence does not hold for many chains of practical interest. Hence it is necessary to derive bounds for chains which converge to the stationary distribution at a rate $r$ which grows to infinity slower than a geometric sequence. These sequences are called subgeometric sequences and are defined in [NT83] as non-decreasing sequences $r$ such that $\log r(n)/n \downarrow 0$ as $n \to \infty$. The set of such sequences is denoted by $\Lambda$. These sequences include, among other examples, the polynomial sequences $r(n) = n^{\gamma}$ with $\gamma > 0$ and the sequences $r(n) = e^{c n^{\delta}}$ with $c > 0$ and $\delta \in (0,1)$.
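The defining property of a subgeometric sequence can be probed numerically on a finite range. The sketch below is only a heuristic check, not a proof: the function name `looks_subgeometric`, the range `[n_min, n_max]` and the tolerance `tol` are assumptions of this example. It works with $\log r$ to avoid overflow, and starts at `n_min = 3` because $\log(n)/n$ is not monotone for the first few integers.

```python
import math

def looks_subgeometric(log_r, n_min=3, n_max=5000, tol=0.05):
    """Finite-range check that r is non-decreasing and log r(n)/n decreases to a small value.

    log_r(n) must return log r(n). The thresholds are arbitrary choices of this sketch.
    """
    ns = range(n_min, n_max + 1)
    logs = [log_r(n) for n in ns]
    ratios = [lr / n for lr, n in zip(logs, ns)]
    nondecreasing = all(a <= b for a, b in zip(logs, logs[1:]))
    ratio_decreasing = all(a >= b for a, b in zip(ratios, ratios[1:]))
    return nondecreasing and ratio_decreasing and ratios[-1] < tol

# Polynomial r(n) = n^2, stretched exponential r(n) = exp(n^0.5), geometric r(n) = 1.5^n.
polynomial = looks_subgeometric(lambda n: 2.0 * math.log(n))
stretched = looks_subgeometric(lambda n: n ** 0.5)
geometric = looks_subgeometric(lambda n: n * math.log(1.5))
```

With these choices the first two sequences pass the check and the geometric sequence does not, since $\log r(n)/n$ stays equal to $\log 1.5$.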

The first general results proving subgeometric rates of convergence were obtained by [NT83] and later extended by [TT94], but they do not provide computable expressions for the bound on the right-hand side of (1). A direct route to quantitative bounds for subgeometric sequences was opened by [Ver97, Ver99], based on coupling techniques. Such techniques were later used in specific contexts by many authors, among others [FM00], [JR01], [For01] and [FM03b].

The goal of this paper is to give a short and self-contained proof of general bounds for subgeometric rates of convergence, under practical conditions. This is done in two steps. The first one is Theorem 1, whose proof, based on coupling, provides an intuitive understanding of the results of [NT83] and [TT94]. The second step is the use of a very general drift condition, recently introduced in [DFMS04]. This condition is recalled in Section 2.1 and the bounds it implies are stated in Proposition 1.

This paper complements the works [DFMS04] and [DMS05], to which we 
refer for applications of the present techniques to practical examples. 

2 Explicit bounds for the rate of convergence 

The only assumption for our main result is the existence of a small set. 

(A1). There exist a set $C \in \mathcal{X}$, a constant $\epsilon > 0$ and a probability measure $\nu$ such that, for all $x \in C$, $P(x,\cdot) \ge \epsilon\,\nu(\cdot)$.

For simplicity, only one-step minorisation is considered in this paper. Adaptations to $m$-step minorisation can be carried out as in [Ros95] (see also [For01] and [FM03b]).
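For a finite state space, a pair $(\epsilon, \nu)$ satisfying a one-step minorisation of the form (A1) on a given set $C$ can be read off the transition matrix by taking column-wise minima over the rows indexed by $C$. The sketch below assumes a hypothetical 3-state matrix `P` and a hypothetical set `C`; the function name `minorisation` is not from the paper.

```python
import numpy as np

def minorisation(P, C):
    """Return (eps, nu) with P[x, :] >= eps * nu for every x in C.

    nu is proportional to the column-wise minimum of the rows of P indexed by C,
    which gives the largest possible eps for this set C.
    """
    m = P[C].min(axis=0)
    eps = float(m.sum())
    if eps <= 0.0:
        raise ValueError("the rows indexed by C have no common mass")
    return eps, m / eps

# Hypothetical example: 3-state chain and candidate small set C = {0, 1}.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
C = [0, 1]
eps, nu = minorisation(P, C)
```

Enlarging `C` to all three states gives a smaller constant here (the column minima become `[0, 0.5, 0]`), which is the trade-off between the size of the small set and the minorisation constant.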

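Let $\check{P}$ be a Markov transition kernel on $\mathsf{X} \times \mathsf{X}$ such that, for all $A \in \mathcal{X}$,

$$\check{P}(x,x'; A \times \mathsf{X}) = P(x,A)\,1_{(C\times C)^c}(x,x') + Q(x,A)\,1_{C\times C}(x,x') , \tag{2}$$

$$\check{P}(x,x'; \mathsf{X} \times A) = P(x',A)\,1_{(C\times C)^c}(x,x') + Q(x',A)\,1_{C\times C}(x,x') , \tag{3}$$

where $A^c$ denotes the complement of the subset $A$ and $Q$ is the so-called residual kernel defined, for $x \in C$ and $A \in \mathcal{X}$, by

$$Q(x,A) = \begin{cases} (1-\epsilon)^{-1}\{P(x,A) - \epsilon\,\nu(A)\} , & 0 < \epsilon < 1 , \\ \nu(A) , & \epsilon = 1 . \end{cases} \tag{4}$$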
One may for example set

$$\check{P}(x,x'; A \times A') = P(x,A)P(x',A')\,1_{(C\times C)^c}(x,x') + Q(x,A)Q(x',A')\,1_{C\times C}(x,x') , \tag{5}$$

but this choice is not always the most suitable; cf. Section 2.2. For $(x,x') \in \mathsf{X} \times \mathsf{X}$, denote by $\check{\mathbb{P}}_{x,x'}$ and $\check{\mathbb{E}}_{x,x'}$ the law and the expectation of a Markov chain with initial distribution $\delta_x \otimes \delta_{x'}$ and transition kernel $\check{P}$.

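Theorem 1. Assume (A1).

For any sequence $r \in \Lambda$, $\delta > 0$ and all $(x,x') \in \mathsf{X} \times \mathsf{X}$,

$$\sum_{n=1}^{\infty} r(n)\,\|P^n(x,\cdot) - P^n(x',\cdot)\|_{\mathrm{TV}} \le (1+\delta)\,\check{\mathbb{E}}_{x,x'}\Big[\sum_{k=0}^{\sigma} r(k)\Big] + \frac{1-\epsilon}{\epsilon}\,M , \tag{6}$$

with $M = (1+\delta)\sup_{n \ge 0}\{R^* r(n-1) - \epsilon(1-\epsilon)^{-1}\delta R(n)/(1+\delta)\}_+$ and $R^* = \sup_{(y,y') \in C\times C}\check{\mathbb{E}}_{y,y'}\big[\sum_{k=1}^{\tau} r(k)\big]$.

Let $W : \mathsf{X} \times \mathsf{X} \to [1,\infty)$ and let $f$ be a non-negative function such that $f(x) + f(x') \le W(x,x')$ for all $(x,x') \in \mathsf{X} \times \mathsf{X}$. Then,

$$\sum_{n=1}^{\infty} \|P^n(x,\cdot) - P^n(x',\cdot)\|_f \le \check{\mathbb{E}}_{x,x'}\Big[\sum_{k=0}^{\sigma} W(X_k,X'_k)\Big] + \frac{1-\epsilon}{\epsilon}\,W^* , \tag{7}$$

with $W^* = \sup_{(y,y') \in C\times C}\check{\mathbb{E}}_{y,y'}\big[\sum_{k=1}^{\tau} W(X_k,X'_k)\big]$.

Here $\sigma$ and $\tau$ denote the hitting time and the return time of the set $C \times C$ by the chain $(X_k, X'_k)$.

Remark 1. Integrating these bounds with respect to $\pi(dx')$ yields similar bounds for $\|P^n(x,\cdot) - \pi\|_{\mathrm{TV}}$ and $\|P^n(x,\cdot) - \pi\|_f$.

Remark 2. The trade-off between the size of the coupling set and the constant $\epsilon$ appears clearly: if the small set is big, then the chain returns more often to the small set and the moments of the hitting times can be expected to be smaller, but the constant $\epsilon$ will be smaller. This trade-off is illustrated numerically in [DMS05, Section 3].

By interpolation, intermediate rates of convergence can be obtained. Let $\alpha$ and $\beta$ be positive and increasing functions such that, for some $0 < \rho < 1$,

$$\alpha(u)\beta(v) \le \rho u + (1-\rho)v , \quad \text{for all } (u,v) \in \mathbb{R}_+ \times \mathbb{R}_+ . \tag{8}$$

Functions satisfying this condition can be obtained from Young's inequality. Let $\psi$ be a real-valued, continuous, strictly increasing function on $\mathbb{R}_+$ such that $\psi(0) = 0$; then for all $a, b > 0$,

$$ab \le \Psi(a) + \Phi(b) , \quad \text{where } \Psi(a) = \int_0^a \psi(x)\,dx \ \text{ and } \ \Phi(b) = \int_0^b \psi^{-1}(x)\,dx ,$$

where $\psi^{-1}$ is the inverse function of $\psi$. If we set $\alpha(u) = \Psi^{-1}(\rho u)$ and $\beta(v) = \Phi^{-1}((1-\rho)v)$, then the pair $(\alpha,\beta)$ satisfies (8). A trivial example is obtained by taking $\psi(x) = x^{p-1}$ for some $p > 1$, which yields $\alpha(u) = (p\rho u)^{1/p}$ and $\beta(u) = (p(1-\rho)u/(p-1))^{(p-1)/p}$. Other examples are given in Section 2.1.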
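The power-function pair obtained from Young's inequality with $\psi(x) = x^{p-1}$ can be checked numerically against condition (8). The sketch below is an illustration under stated assumptions: the function name `young_pair`, the grid and the parameter values are hypothetical, and `rho` stands for the interpolation constant in (8).

```python
import numpy as np

def young_pair(p, rho):
    """Interpolation pair from psi(x) = x**(p-1), with p > 1 and 0 < rho < 1.

    Psi(a) = a**p / p, so alpha(u) = Psi^{-1}(rho*u) = (p*rho*u)**(1/p).
    Phi(b) = (p-1)/p * b**(p/(p-1)), so
    beta(v) = Phi^{-1}((1-rho)*v) = (p*(1-rho)*v/(p-1))**((p-1)/p).
    """
    def alpha(u):
        return (p * rho * u) ** (1.0 / p)

    def beta(v):
        return (p * (1.0 - rho) * v / (p - 1.0)) ** ((p - 1.0) / p)

    return alpha, beta

def max_violation(p, rho, grid):
    """Largest value of alpha(u)*beta(v) - (rho*u + (1-rho)*v) over the grid."""
    alpha, beta = young_pair(p, rho)
    u, v = np.meshgrid(grid, grid)
    return float(np.max(alpha(u) * beta(v) - (rho * u + (1.0 - rho) * v)))

grid = np.linspace(0.0, 50.0, 201)
worst = max_violation(p=3.0, rho=0.4, grid=grid)
```

Up to rounding error, `worst` should not be positive; equality in Young's inequality is attained when $\beta(v) = \psi(\alpha(u))$.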
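Corollary 1. Let $\alpha$ and $\beta$ be two positive functions satisfying (8) for some $0 < \rho < 1$. Then, for any non-negative function $f$ such that $f(x) + f(x') \le \beta \circ W(x,x')$, any $\delta > 0$ and all $x, x' \in \mathsf{X}$,

$$\sum_{n \ge 1} \alpha(r(n))\,\|P^n(x,\cdot) - P^n(x',\cdot)\|_f \le \rho(1+\delta)\,\check{\mathbb{E}}_{x,x'}\Big[\sum_{k=0}^{\sigma} r(k)\Big] + (1-\rho)\,\check{\mathbb{E}}_{x,x'}\Big[\sum_{k=0}^{\sigma} W(X_k,X'_k)\Big] + \frac{1-\epsilon}{\epsilon}\,\{\rho M + (1-\rho)W^*\} . \tag{9}$$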



2.1 Drift Conditions for subgeometric ergodicity 

The bounds obtained in Theorem 1 and Corollary 1 are meaningful only if they are finite. Sufficient conditions are given in this section in the form of drift conditions. The best-known drift condition is the so-called Foster-Lyapounov drift condition, which not only implies but is actually equivalent to geometric convergence to the stationary distribution, cf. [MT93, Chapter 16]. [JR01], simplifying and generalizing an argument in [FM00], introduced a drift condition which implies polynomial rates of convergence. We consider here the following drift condition, introduced in [DFMS04], which makes it possible to bridge the gap between polynomial and geometric rates of convergence.

Condition D$(\phi,V,C)$: There exist a function $V : \mathsf{X} \to [1,\infty]$, a concave monotone non-decreasing differentiable function $\phi : [1,\infty] \to (0,\infty]$, a measurable set $C$ and a constant $b > 0$ such that

$$PV + \phi \circ V \le V + b\,1_C .$$
If the function $\phi$ is concave, non-decreasing and differentiable, define

$$H_\phi(v) := \int_1^v \frac{dx}{\phi(x)} . \tag{10}$$

Then $H_\phi$ is a non-decreasing concave differentiable function on $[1,\infty)$. Moreover, since $\phi$ is concave, $\phi'$ is non-increasing. Hence $\phi(v) \le \phi(1) + \phi'(1)(v-1)$ for all $v \ge 1$, which implies that $H_\phi$ increases to infinity. We can thus define its inverse $H_\phi^{-1} : [0,\infty) \to [1,\infty)$, which is also an increasing and differentiable function, with derivative $(H_\phi^{-1})'(x) = \phi \circ H_\phi^{-1}(x)$. For $z \ge 0$, define

$$r_\phi(z) := (H_\phi^{-1})'(z) = \phi \circ H_\phi^{-1}(z) . \tag{11}$$

It is readily checked that if $\lim_{t\to\infty} \phi'(t) = 0$, then $r_\phi \in \Lambda$, cf. [DFMS04, Lemma 2.3].
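Since $H_\phi^{-1}$ solves the differential equation $y' = \phi(y)$ with $y(0) = 1$, the rate $r_\phi$ in (11) can be approximated numerically when no closed form is available. The sketch below uses a classical fourth-order Runge-Kutta scheme; this scheme, the function names `inverse_H` and `r_phi`, and the step count are choices made for this illustration and are not part of the paper.

```python
def inverse_H(phi, z, n_steps=2000):
    """Approximate H_phi^{-1}(z) by integrating y' = phi(y), y(0) = 1, with RK4."""
    h = z / n_steps
    y = 1.0
    for _ in range(n_steps):
        k1 = phi(y)
        k2 = phi(y + 0.5 * h * k1)
        k3 = phi(y + 0.5 * h * k2)
        k4 = phi(y + h * k3)
        y += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return y

def r_phi(phi, z, n_steps=2000):
    """Rate function r_phi(z) = phi(H_phi^{-1}(z)) from (11)."""
    return phi(inverse_H(phi, z, n_steps))

# Polynomial drift phi(v) = c * v**a with c = 0.5 and a = 0.5 (hypothetical values).
# Here H_phi^{-1}(z) = (1 + c*(1-a)*z)**(1/(1-a)) = (1 + 0.25*z)**2 in closed form.
phi_poly = lambda v: 0.5 * v ** 0.5
rate_at_10 = r_phi(phi_poly, 10.0)
```

For this choice of $\phi$ the closed form gives $r_\phi(10) = 0.5\,(1 + 0.25 \cdot 10) = 1.75$, which the numerical value should match closely.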

Proposition 2.2 and Theorem 2.3 in [DMS05] show that the drift condition D$(\phi,V,C)$ implies that the bounds of Theorem 1 are finite. We gather these results here.

Proposition 1. Assume that Condition D$(\phi,V,C)$ holds for some small set $C$ and that $\inf_{x \notin C} \phi \circ V(x) > b$. Fix some arbitrary $\lambda \in (0, 1 - b/\inf_{x \notin C} \phi \circ V(x))$ and define $W(x,x') = \lambda\,\phi(V(x) + V(x') - 1)$. Define also $V^* = (1-\epsilon)^{-1}\sup_{y \in C}\{PV(y) - \epsilon\,\nu(V)\}$. Let $\sigma$ be the hitting time of the set $C \times C$. Then



.fe=o 



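$$\check{\mathbb{E}}_{x,x'}\Big[\sum_{k=0}^{\sigma} W(X_k,X'_k)\Big] \le \sup_{(y,y') \in C\times C} W(y,y') + \{V(x) + V(x')\}\,1_{(x,x') \notin C\times C} ,$$

$$R^* \le 1 + \{2V^* - 1\} ,$$

$$W^* \le \sup_{(y,y') \in C\times C} W(y,y') + 2V^* - 1 .$$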

Remark 3. The condition $\inf_{x \notin C} \phi \circ V(x) > b$ may not be fulfilled. If level sets $\{V \le d\}$ are small, then the set $C$ can be enlarged so that this condition holds. This additional condition may appear rather strong, but can be weakened by using small sets associated to some iterate of the kernel (see e.g. [Ros95], [For01] and [FM03b]).

We now give examples of rates that can be obtained by (11). 
Polynomial rates 

Polynomial rates of convergence are obtained when Condition D$(\phi,V,C)$ holds with $\phi(v) = cv^{\alpha}$ for some $\alpha \in [0,1)$ and $c \in (0,1]$. The rate of convergence in total variation distance is $r_\phi(n) \propto n^{\alpha/(1-\alpha)}$ and the pairs $(r,f)$ for which (9) holds are of the form $(n^{(1-p)\alpha/(1-\alpha)}, V^{\alpha p})$ for $p \in [0,1]$, or in other terms, $(n^{\kappa-1}, V^{1-\kappa(1-\alpha)})$ for $1 \le \kappa \le 1/(1-\alpha)$, which is Theorem 3.6 of [JR01].

It is possible to extend this result by using more general interpolation functions. For instance, choosing for $b > 0$, $\alpha(x) = (1 \vee \log(x))^{b}$ and $\beta(x) = x(1 \vee \log(x))^{-b}$ yields the pairs $(n^{(1-p)\alpha/(1-\alpha)}\log^{b}(n), V^{\alpha p}(1 + \log V)^{-b})$, for $p \in [0,1]$.

Logarithmic rates of convergence 

Rates of convergence slower than any polynomial can be obtained when condition D$(\phi,V,C)$ holds with a function $\phi$ that increases to infinity slower than polynomially, for instance $\phi(v) = c(1 + \log(v))^{\alpha}$ for some $\alpha > 0$ and $c \in (0,1]$. A straightforward calculation shows that

$$r_\phi(n) \asymp \log^{\alpha}(n) .$$

Pairs for which (9) holds are thus of the form $((1 + \log(n))^{(1-p)\alpha}, (1 + \log(V))^{p\alpha})$.
Subexponential rates of convergence

Subexponential rates of convergence faster than any polynomial are obtained when the condition D$(\phi,V,C)$ holds with $\phi$ such that $v/\phi(v)$ goes to infinity slower than polynomially. Assume for instance that $\phi$ is concave and differentiable on $[1,+\infty)$ and that for large $v$, $\phi(v) = cv/\log^{\alpha}(v)$ for some $\alpha > 0$ and $c > 0$. A simple calculation yields

$$r_\phi(n) \asymp n^{-\alpha/(1+\alpha)} \exp\big(\{c(1+\alpha)n\}^{1/(1+\alpha)}\big) .$$

Choosing $\alpha(x) = x^{1-p}(1 \vee \log(x))^{-b}$ and $\beta(x) = x^{p}(1 \vee \log(x))^{b}$ for $p \in (0,1)$ and $b \in \mathbb{R}$; or $p = 0$ and $b > 0$; or $p = 1$ and $b < -\alpha$ yields the pairs

$$n^{-(\alpha+b)/(1+\alpha)} \exp\big((1-p)\{c(1+\alpha)n\}^{1/(1+\alpha)}\big) , \qquad V^{p}(1 + \log V)^{b} .$$



2.2 Stochastically monotone chains 



Let $\mathsf{X}$ be a totally ordered set, let the order relation be denoted by $\preceq$ and, for $a \in \mathsf{X}$, let $(-\infty,a]$ denote the set of all $x \in \mathsf{X}$ such that $x \preceq a$. A transition kernel $P$ on $\mathsf{X}$ is said to be stochastically monotone if $x \preceq y$ implies $P(x,(-\infty,a]) \ge P(y,(-\infty,a])$ for all $a \in \mathsf{X}$. If Assumption (A1) holds for a small set $C = (-\infty,a_0]$, then instead of defining the kernel $\check{P}$ as in (5), it is convenient to define it, for $x, x' \in \mathsf{X}$ and $A \in \mathcal{X} \otimes \mathcal{X}$, by

$$\check{P}(x,x';A) = 1_{(x,x') \notin C\times C}\int_0^1 1_A\big(P^{\leftarrow}(x,u),P^{\leftarrow}(x',u)\big)\,du + 1_{C\times C}(x,x')\int_0^1 1_A\big(Q^{\leftarrow}(x,u),Q^{\leftarrow}(x',u)\big)\,du ,$$

where, for any transition kernel $K$ on $\mathsf{X}$, $K^{\leftarrow}(x,\cdot)$ is the quantile function of the probability measure $K(x,\cdot)$, and $Q$ is the residual kernel defined in (4). This construction makes the set $\{(x,x') \in \mathsf{X} \times \mathsf{X} : x \preceq x'\}$ absorbing for $\check{P}$. This means that if the chain $(X_n,X'_n)$ starts at $(x_0,x'_0)$ with $x_0 \preceq x'_0$, then almost surely, $X_n \preceq X'_n$ for all $n$. Let now $\sigma_C$ and $\sigma_{C\times C}$ denote the hitting times of the sets $C$ and $C \times C$, respectively. Then, we have the following very simple relations between the moments of the hitting times of the one-dimensional chain and those of the bidimensional chain with transition kernel $\check{P}$. For any sequence $r$, any non-negative function $V$ and all $x \preceq x'$,



^ r{k)V{X^,X'^) 



k=0 



Y.r{k)V{X',) 



.k=0 



A similar bound obviously holds for the return times. Thus, it only remains to obtain bounds for these quantities, which is very straightforward if moreover condition D$(\phi,V,C)$ holds. Examples of stochastically monotone chains with applications to queuing and Monte Carlo simulation that satisfy condition D$(\phi,V,C)$ are given in [DMS05, Section 3].
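The order-preserving property of the quantile coupling can be illustrated on a finite ordered state space $\{0,\dots,n-1\}$: both components are driven by the same uniform variable through the quantile functions of their transition rows. The sketch below covers only the part of the construction that uses $P^{\leftarrow}$ (the residual kernel would be handled in the same way); the matrix `P` and the function names are hypothetical.

```python
import numpy as np

def quantile(row, u):
    """Quantile function of a probability vector on {0, ..., n-1}: inf{a : F(a) >= u}."""
    cdf = np.cumsum(row)
    # searchsorted with side="left" returns the first index i with cdf[i] >= u.
    return int(min(np.searchsorted(cdf, u, side="left"), len(row) - 1))

def monotone_step(x, xp, P, rng):
    """One step of the quantile coupling: both chains use the same uniform draw."""
    u = rng.random()
    return quantile(P[x], u), quantile(P[xp], u)

# Hypothetical stochastically monotone chain: the cdf of row x dominates the cdf of row y
# whenever x <= y.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])

# Deterministic check of order preservation over a grid of uniform values.
order_preserved = all(
    quantile(P[x], u) <= quantile(P[xp], u)
    for u in np.linspace(0.001, 0.999, 999)
    for x in range(3)
    for xp in range(x, 3)
)
```

Because each component is moved by the quantile transform of its own row, the marginal transition law of each chain is still given by `P`.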
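3 Proof of Theorem 1

Define a transition kernel $\tilde{P}$ on the space $\tilde{\mathsf{X}} = \mathsf{X} \times \mathsf{X} \times \{0,1\}$ endowed with the product $\sigma$-field $\tilde{\mathcal{X}}$, for any $x, x' \in \mathsf{X}$ and $A, A' \in \mathcal{X}$, by

$$\tilde{P}\big((x,x',0), A \times A' \times \{0\}\big) = \{1 - \epsilon\,1_{C\times C}(x,x')\}\,\check{P}\big((x,x'), A \times A'\big) , \tag{12}$$

$$\tilde{P}\big((x,x',0), A \times A' \times \{1\}\big) = \epsilon\,1_{C\times C}(x,x')\,\nu_{x,x'}(A \cap A') , \tag{13}$$

$$\tilde{P}\big((x,x',1), A \times A' \times \{1\}\big) = P(x, A \cap A') . \tag{14}$$

For any probability measure $\tilde{\mu}$ on $(\tilde{\mathsf{X}},\tilde{\mathcal{X}})$, let $\tilde{\mathbb{P}}_{\tilde{\mu}}$ be the probability measure on the canonical space $(\tilde{\mathsf{X}}^{\mathbb{N}}, \tilde{\mathcal{X}}^{\otimes\mathbb{N}})$ such that the coordinate process $\{\tilde{X}_k\}$ is a Markov chain with transition kernel $\tilde{P}$ and initial distribution $\tilde{\mu}$. The corresponding expectation operator is denoted by $\tilde{\mathbb{E}}_{\tilde{\mu}}$.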

The transition kernel $\tilde{P}$ can be described algorithmically. Given $\tilde{X}_0 = (X_0,X'_0,d_0) = (x,x',d)$, $\tilde{X}_1 = (X_1,X'_1,d_1)$ is obtained as follows.

• If $d = 1$ then draw $X_1$ from $P(x,\cdot)$ and set $X'_1 = X_1$, $d_1 = 1$.

• If $d = 0$ and $(x,x') \in C \times C$, flip a coin with probability of heads $\epsilon$.

- If the coin comes up heads, draw $X_1$ from $\nu_{x,x'}$ and set $X'_1 = X_1$ and $d_1 = 1$.

- If the coin comes up tails, draw $(X_1,X'_1)$ from $\check{P}(x,x';\cdot)$ and set $d_1 = 0$.

• If $d = 0$ and $(x,x') \notin C \times C$, draw $(X_1,X'_1)$ from $\check{P}(x,x';\cdot)$ and set $d_1 = 0$.
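The steps above can be sketched for a finite state space as follows. This is an illustration under explicit assumptions, not the authors' implementation: the bivariate kernel is taken to be the independent choice (5), the coupling measure is the fixed measure $\nu$ of (A1), the minorisation constant is assumed to satisfy $\epsilon < 1$, and the matrix `P`, the set `C` and all function names are hypothetical.

```python
import numpy as np

def minorisation(P, C):
    """(eps, nu) such that P[x, :] >= eps * nu for all x in C (column-wise minima)."""
    m = P[C].min(axis=0)
    eps = float(m.sum())
    return eps, m / eps

def residual_row(P, x, eps, nu):
    """Row x of the residual kernel (P(x, .) - eps*nu) / (1 - eps), assuming eps < 1."""
    q = np.clip(P[x] - eps * nu, 0.0, None)   # clip tiny negative rounding errors
    return q / q.sum()

def coupled_step(x, xp, d, P, C, eps, nu, rng):
    """One transition of the chain (X, X', d), where d is the bell variable."""
    n = P.shape[0]
    if d == 1:                                # already coupled: move together
        x1 = int(rng.choice(n, p=P[x]))
        return x1, x1, 1
    if x in C and xp in C:
        if rng.random() < eps:                # heads: couple through nu
            x1 = int(rng.choice(n, p=nu))
            return x1, x1, 1
        # tails: independent moves under the residual kernel, as in (5)
        return (int(rng.choice(n, p=residual_row(P, x, eps, nu))),
                int(rng.choice(n, p=residual_row(P, xp, eps, nu))), 0)
    # outside C x C: independent moves under P, as in (5)
    return int(rng.choice(n, p=P[x])), int(rng.choice(n, p=P[xp])), 0

P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
C = [0, 1]
eps, nu = minorisation(P, C)
```

Iterating `coupled_step` from a state `(x, x', 0)` until the bell variable equals 1 gives a draw of the coupling time $T$; from then on the two components coincide.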

The variable $d_n$ is called the bell variable; it indicates whether coupling has occurred by time $n$ ($d_n = 1$) or not ($d_n = 0$). The first index $n$ at which $d_n = 1$ is the coupling time:

$$T = \inf\{k \ge 1 : d_k = 1\} .$$

If $d_n = 1$ then $X_k = X'_k$ for all $k \ge n$. This coupling construction is carried out in such a way that under $\tilde{\mathbb{P}}_{\xi \otimes \xi' \otimes \delta_0}$, $\{X_k\}$ and $\{X'_k\}$ are Markov chains with transition kernel $P$ and initial distributions $\xi$ and $\xi'$ respectively.

The main tool of the proof is the following relation between $\tilde{\mathbb{E}}_{x,x',0}$ and $\check{\mathbb{E}}_{x,x'}$, proved in [DMR04, Lemma 1]. For any non-negative adapted process $(\chi_k)_{k \ge 0}$ and $(x,x') \in \mathsf{X} \times \mathsf{X}$,

$$\tilde{\mathbb{E}}_{x,x',0}\big[\chi_n 1_{\{T > n\}}\big] = \check{\mathbb{E}}_{x,x'}\big[\chi_n (1-\epsilon)^{N_{n-1}}\big] , \tag{15}$$

where $N_n = \sum_{i=0}^{n} 1_{C\times C}(X_i,X'_i)$ is the number of visits to $C \times C$ before time $n$ (with the convention $N_{-1} = 0$).

We now proceed with the proof of Theorem 1. 
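Step 1 Lindvall's inequality [Lin79, Lin92]

$$\sum_{k=0}^{\infty} r(k)\,\|P^k(x,\cdot) - P^k(x',\cdot)\|_f \le \tilde{\mathbb{E}}_{x,x',0}\Big[\sum_{j=0}^{T-1} r(j)\{f(X_j) + f(X'_j)\}\Big] . \tag{16}$$

Proof. For any measurable function $\phi$ such that $|\phi| \le f$, and for any $(x,x') \in \mathsf{X} \times \mathsf{X}$ it holds that

$$|P^k\phi(x) - P^k\phi(x')| = \big|\tilde{\mathbb{E}}_{x,x',0}\big[\{\phi(X_k) - \phi(X'_k)\}1_{\{d_k = 0\}}\big]\big| \le \tilde{\mathbb{E}}_{x,x',0}\big[\{f(X_k) + f(X'_k)\}1_{\{T > k\}}\big] .$$

Hence $\|P^k(x,\cdot) - P^k(x',\cdot)\|_f \le \tilde{\mathbb{E}}_{x,x',0}[\{f(X_k) + f(X'_k)\}1_{\{T > k\}}]$. Multiplying by $r(k)$ and summing over $k$ yields (16). □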

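Step 2 Denote $W_{r,f}(x,x') = \check{\mathbb{E}}_{x,x'}\big[\sum_{k=0}^{\sigma} r(k)f(X_k,X'_k)\big]$ and $W^*_{r,f} = \sup_{(y,y') \in C\times C}\check{\mathbb{E}}_{y,y'}\big[\sum_{k=1}^{\tau} r(k)f(X_k,X'_k)\big]/r(0)$. Then

$$\tilde{\mathbb{E}}_{x,x',0}\Big[\sum_{k=0}^{T-1} r(k)f(X_k,X'_k)\Big] \le W_{r,f}(x,x') + \epsilon^{-1}(1-\epsilon)\,W^*_{r,f}\,\tilde{\mathbb{E}}_{x,x',0}[r(T-1)] . \tag{17}$$

Proof. Applying (15), we obtain

$$\begin{aligned}
\tilde{\mathbb{E}}_{x,x',0}\Big[\sum_{k=0}^{T-1} r(k)f(X_k,X'_k)\Big]
&= \sum_{k=0}^{\infty} \tilde{\mathbb{E}}_{x,x',0}\big[r(k)f(X_k,X'_k)1_{\{T > k\}}\big] \\
&= \sum_{k=0}^{\infty} \check{\mathbb{E}}_{x,x'}\big[r(k)f(X_k,X'_k)(1-\epsilon)^{N_{k-1}}\big] \\
&= \sum_{j=0}^{\infty}\sum_{k=0}^{\infty} (1-\epsilon)^{j}\,\check{\mathbb{E}}_{x,x'}\big[r(k)f(X_k,X'_k)1_{\{N_{k-1} = j\}}\big] \\
&= W_{r,f}(x,x') + \sum_{j=1}^{\infty}\sum_{k=0}^{\infty} (1-\epsilon)^{j}\,\check{\mathbb{E}}_{x,x'}\big[r(k)f(X_k,X'_k)1_{\{N_{k-1} = j\}}\big] .
\end{aligned}$$

For $j \ge 0$, let $\sigma_j$ denote the $(j+1)$-th visit to $C \times C$. Then $N_{k-1} = j$ iff $\sigma_{j-1} < k \le \sigma_j$. Since $r$ is a subgeometric sequence, $r(n+m) \le r(n)r(m)/r(0)$, thus

$$\begin{aligned}
\sum_{k=0}^{\infty} r(k)f(X_k,X'_k)1_{\{N_{k-1} = j\}}
&= \sum_{k=\sigma_{j-1}+1}^{\sigma_j} r(k)f(X_k,X'_k) \\
&= \sum_{k=1}^{\sigma_j - \sigma_{j-1}} r(\sigma_{j-1}+k)\,f(X_{\sigma_{j-1}+k},X'_{\sigma_{j-1}+k}) \\
&\le \frac{r(\sigma_{j-1})}{r(0)} \sum_{k=1}^{\sigma_j - \sigma_{j-1}} r(k)\,f(X_{\sigma_{j-1}+k},X'_{\sigma_{j-1}+k}) .
\end{aligned}$$

Applying the strong Markov property yields

$$\tilde{\mathbb{E}}_{x,x',0}\Big[\sum_{k=0}^{T-1} r(k)f(X_k,X'_k)\Big] \le W_{r,f}(x,x') + (1-\epsilon)\,W^*_{r,f}\sum_{j=0}^{\infty}(1-\epsilon)^{j}\,\check{\mathbb{E}}_{x,x'}[r(\sigma_j)] .$$

By similar calculations, (15) yields

$$\tilde{\mathbb{E}}_{x,x',0}[r(T-1)] = \epsilon\sum_{j=0}^{\infty}(1-\epsilon)^{j}\,\check{\mathbb{E}}_{x,x'}[r(\sigma_j)] ,$$

which concludes the proof of (17). □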
Step 3 Applying (17) with $r \equiv 1$ yields (7).

Step 4 If $r \in \Lambda$, then $\lim_{n\to\infty} r(n)/R(n) = 0$, with $R(0) = 1$ and $R(n) = \sum_{k=0}^{n-1} r(k)$, $n \ge 1$. Thus we can define, for $r \in \Lambda$ and $\delta > 0$,

$$M_\delta = (1+\delta)\sup_{n \ge 0}\big\{\epsilon^{-1}(1-\epsilon)\,W^*_{r,1}\,r(n-1) - \delta R(n)/(1+\delta)\big\}_+ .$$

$M_\delta$ is finite for all $\delta > 0$. This yields

$$\tilde{\mathbb{E}}_{x,x',0}[R(T)] \le (1+\delta)\,W_{r,1}(x,x') + M_\delta .$$

Applying this bound with (16) yields (6). □
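The constant $M_\delta$ of Step 4 is the supremum of a sequence whose terms eventually become negative because $r(n)/R(n) \to 0$, so it can be approximated by a finite scan. The sketch below is an illustration under assumptions: the constant `a` stands for $\epsilon^{-1}(1-\epsilon)W^*_{r,1}$ and must be supplied by the user, the scan starts at $n = 1$ (the term $n = 0$ involves $r(-1)$ and is ignored here), and the truncation level `n_max` is arbitrary.

```python
def M_delta(r, a, delta, n_max=100000):
    """Approximate (1+delta) * sup_n { a*r(n-1) - delta*R(n)/(1+delta) }_+ over 1 <= n <= n_max.

    R(n) = r(0) + ... + r(n-1) is accumulated on the fly.
    """
    best = 0.0            # positive part: the supremum is at least 0
    R = 0.0
    for n in range(1, n_max + 1):
        R += r(n - 1)     # now R equals R(n)
        best = max(best, a * r(n - 1) - delta * R / (1.0 + delta))
    return (1.0 + delta) * best

# Two hypothetical rate sequences with a = 2 and delta = 1.
m_const = M_delta(lambda n: 1.0, a=2.0, delta=1.0, n_max=100)
m_linear = M_delta(lambda n: n + 1.0, a=2.0, delta=1.0, n_max=1000)
```

For the constant sequence the scanned terms are $2 - n/2$, and for $r(n) = n + 1$ they are $2n - n(n+1)/4$, so in both cases the maximum is reached within the first few values of $n$.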